1 Map of Sampling Stations in the ECS

Load required package:

library(marmap)

Import bathymetry data from NOAA:

b = getNOAA.bathy(lon1 = 115, lon2 = 140, lat1 = 35, lat2 = 20, resolution = 1)
## Querying NOAA database ...
## This may take seconds to minutes, depending on grid size
## Building bathy matrix ...

Import dataframe with the latitude and longitude of sampling points from CTD (in decimal degrees):

pts <- read.csv("~/Desktop/MiraiFilters/Mirai_CTD_lat_long.csv")
pts$Station<- as.character(pts$Station)
str(pts)
## 'data.frame':    14 obs. of  3 variables:
##  $ Lon    : num  126 128 127 126 125 ...
##  $ Lat    : num  26.3 25.4 25.9 26.5 25.9 ...
##  $ Station: chr  "2" "3" "4" "5" ...

Make dataframe with map labels and their lat/long:

Clon = c(127.5,131.25, 121.6, 121.03, 120.8, 126)
Clat= c(26, 31.93, 31.1, 23, 27.2, 22)
CLabels= c("Okinawa", "Japan", "China", "Taiwan", "East China Sea", "Philippine Sea")
Countries = cbind.data.frame(Clon, Clat, CLabels)
str(Countries)
## 'data.frame':    6 obs. of  3 variables:
##  $ Clon   : num  128 131 122 121 121 ...
##  $ Clat   : num  26 31.9 31.1 23 27.2 ...
##  $ CLabels: Factor w/ 6 levels "China","East China Sea",..: 4 3 1 6 2 5

Make a dataframe with vent sites:

vents <- read.csv("~/Desktop/MiraiFilters/VentSites.csv")
str(vents)
## 'data.frame':    25 obs. of  3 variables:
##  $ Long: num  122 122 123 123 123 ...
##  $ Lat : num  24.8 24.8 25.1 24.8 24.9 ...
##  $ Name: Factor w/ 25 levels "Akuseki-jima",..: 11 12 21 24 4 5 20 22 23 8 ...

Define colors to use in plot:

blues <- c("lightsteelblue4", "lightsteelblue3", "lightsteelblue2", "lightsteelblue1")
greys <- c(grey(0.6), grey(0.93), grey(0.99))
Newblues <- colorRampPalette(c("red","purple","blue",
"cadetblue1","white"))
blues2<-c("blue",
"cadetblue1", "lightsteelblue2")

Make plot one layer at a time:

tiff("MiraiFiltersMap.tiff", height= 8, width =8, units = 'in', res=300) # uncomment to save plot to file
plot(b, image = TRUE, land = TRUE, xlim=c(120,135), ylim=c(22, 32), lwd = 0.03, bpal = list(c(0, max(b), grey(.7), grey(.9), grey(.95)),
c(min(b), 0, "darkblue", "lightblue"))) 
#plot(b, image = TRUE, land = TRUE, xlim=c(120,135), ylim=c(22, 32), lwd = 0.03, bpal = Newblues(100), cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5)
#bathymetry colors
plot(b, lwd = 0.8, deep = 0, shallow = 0, step = 0, add = TRUE) # highlight coastline with solid black line
plot(b, deep=-200, shallow=-200, lwd = 0.4, drawlabels=T, add=T)  # Add -200m isobath
plot(b, deep=-1000, shallow=-1000, lwd = 0.4, drawlabels=T, add=T)  # Add -1000m isobath
plot(b, deep=-2000, shallow=-2000, lwd = 0.4, drawlabels=T, add=T)  # Add -2000m isobath
plot(b, deep=-3000, shallow=-3000, lwd = 0.4, drawlabels=T, add=T)  # Add -3000m isobath
points(pts, pch = 16, col = 'red') # Add sampling points
text(pts, labels=pts$Station, adj= 1.5, cex=1.2, size = 4) # Add sampling station numbers
text(Countries, labels=Countries$CLabels, cex= 1.7, adj = -0.2) # Add map labels 
points(vents, pch = 17, col = 'blue', size = 3)
dev.off() #uncomment to save plot to file
## quartz_off_screen 
##                 2

2 Sequence analysis

2.1 Identify and Count Amplicon Sequence Variants (ASVs)

Sample collection, DNA extraction, PCR, & sequencing

Seawater collected from the Deep Chlorophyll Maximum (DCM), mid-water column, and near-bottom was collected in 10 L Niskin bottles at each sampling station and surface seawater was collected by bucket. Five Liter replicates from each depth (4.5 from surface) were sequentially filtered through 10.0 and 0.2 um pore-size Polytetrafluoroethylene (PTFE) filters. Filters were flash frozen in liquid nitrogen and stored at -80 C until DNA extraction. DNA was extracted following the manufacturers protocols for the Qiagen/MoBio DNeasy PowerWater Kit, including the optional heating step to lyse cells. Amplicon PCR and sequencing library preparation followed the procedures described in the Illumina 16S Metagenomic Sequencing Library Preparation manual modified to include universal eukaryotic primers for the v4 region of the 18S rRNA gene (F: Stoeck et al. 2010, R: Brisbin et al. 2018) and amplicon PCR conditions most appropriate for these primers (annealing at 58C). The 220 samples were separated into 4 sequencing runs on the Illumina MiSeq sequencing platform with v3 chemistry for 300x300-bp paired-end sequencing.

Bioinformatic processing with Qiime2

Demultiplexed paired-end sequences provided by the OIST sequencing center were imported to Qiime2 (v2018.11) software (Bolyen et al. 2018). The Divisive Amplicon Denoising Algorithm (DADA) was implemented with the DADA2 plug-in for Qiime2 for quality filtering, chimera removal, and feature table construction (Callahan et al. 2015) separately for each sequencing run before merging the four resulting feature tables. Taxonomy was assigned to Amplicon Sequence Variants (ASVs) in the feature table by training a classifier on the Protist Ribosomal Reference (PR2) database to classify ASVs (Guillo et al. 2012). The feature table and assigned taxonomy were then exported from Qiime2 for further analysis.

Below is an example Qiime2 workflow for 1 sequencing run:

Activate an anaconda environment for qiime2: source activate qiime2-2018.11

Import sequencing data (all sequences for eacg sequencing run must be in their own directory): qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path ./seqs/seqrun1/ --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path run1_demux-paired-end.qza

Visualize squencing results (use these graphs to decide what length to trim sequences) qiime demux summarize --i-data run1_demux-paired-end.qza --o-visualization run1_demux.qzv

Run the DADA2 algorithm: qiime dada2 denoise-paired --i-demultiplexed-seqs run1_demux-paired-end.qza --output-dir ./dd2run1 --o-representative-sequences run1_rep-seqs --p-trim-left-f 10 --p-trim-left-r 10 --p-trunc-len-f 260 --p-trunc-len-r 250 --p-n-threads 3

Check results: qiime feature-table summarize --i-table ./dd2run1/table.qza --o-visualization ./dd2run1/table.qzv --m-sample-metadata-file ~/desktop/MiraiFilters/sampleMaptxt.txt

qiime metadata tabulate --m-input-file dd2run1/denoising_stats.qza --o-visualization dd2run1/denoising_stats.qzv

Merge feature tables from multiple sequencing runs: qiime feature-table merge --i-tables ./dd2run1/table.qza --i-tables ./dd2run2/table.qza --i-tables ./dd2run3/table.qza --i-tables ./dd2run4/table.qza --o-merged-table mergedtable.qza

qiime feature-table merge-seqs --i-data run1_rep-seqs.qza --i-data run2_rep-seqs.qza --i-data run3_rep-seqs.qza --i-data run4_rep-seqs.qza --o-merged-data merged-rep-seqs.qza

Import taxonomy database: qiime tools import --type 'FeatureData[Sequence]' --input-path pr2_version_4.11.1_mothur.fasta --output-path pr2.qza

qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path pr2_version_4.11.1_mothur.tax -output-path pr2_tax.qza

Select v4 region from PR2 sequences: qiime feature-classifier extract-reads --i-sequences pr2.qza --p-f-primer CCAGCASCYGCGGTAATTCC --p-r-primer ACTTTCGTTCTTGATYRA --o-reads pr2_v4.qza

Train the classifier: qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads pr2_v4.qza --i-reference-taxonomy pr2_tax.qza --o-classifier pr2_v4_classifier.qza

Classify: qiime feature-classifier classify-sklearn --i-classifier pr2_v4_classifier.qza --i-reads merged-rep-seqs.qza --o-classification taxonomy.qza

qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv

Export the .tsv taxonomy file from taxonomy.qzv for use in following steps.

2.2 Load data into R & prepare phyloseq objects

Load packages:

library(tidyverse)
library(qiime2R)
library('phyloseq')
library('vegan')
library('ggplot2')
library(RColorBrewer)
library(DESeq2)
library(reshape)
library("wesanderson")
library("gridExtra")
library("metagMisc")
library("wesanderson")
library("breakaway")
library("CoDaSeq")
library("ggbiplot")

Set default colors:

cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
P <- brewer.pal(12, "Paired")
S2 <- brewer.pal(8, "Set2")
S1 <- brewer.pal(8, "Set1")
D2 <- brewer.pal(8, "Dark2")
colors<- c(P, S2, S1,D2)
colors2 <- rep(colors, 6)
ap<- c("#F1B6A1", "#D4A52A", "#E3E5DB", "#A5CFCC", "#0E899F", "#A83860", "#ED91BC", "#DB5339", "#F58851", "#42465C", "#1E479A", "#F7CDA4", "#CF529C", "#11638C")

Convert Qiime2 artifacts to phyloseq objects:

phyloseq<-qza_to_phyloseq(features="MiraiSeqs/mergedtable.qza")
## Warning in read_qza(features): Artifact was not generated with Qiime2
## 2018.4, if data is not successfully imported, please report here
## github.com/jbisanz/qiime2R/issues

Import sample data:

metatable <- read.csv("sampleMap.csv")
row.names(metatable) <- metatable[[1]]

metatable$Station <- as.character(metatable$Station)
metatable$filter <- as.character(metatable$filter)
metatable$VentANAHTM<-as.factor(metatable$VentANAHTM)

META<- sample_data(metatable)
str(META)
## 'data.frame':    219 obs. of  11 variables:
## Formal class 'sample_data' [package "phyloseq"] with 4 slots
##   ..@ .Data    :List of 11
##   .. ..$ : Factor w/ 219 levels "A6-F173-DNA",..: 3 6 9 12 96 97 115 118 129 134 ...
##   .. ..$ : chr  "2" "2" "2" "2" ...
##   .. ..$ : Factor w/ 4 levels "Bottom","Mid",..: 4 4 4 4 1 1 2 2 3 3 ...
##   .. ..$ : int  0 0 0 0 1000 1000 700 700 92 92 ...
##   .. ..$ : chr  "10" "2" "10" "2" ...
##   .. ..$ : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : Factor w/ 4 levels "KG","MOT","NOT",..: 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 3 levels "KG","NOT","SOT": 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 3 levels "n","p","y": 3 3 3 3 3 3 3 3 3 3 ...
##   .. ..$ : Factor w/ 4 levels "KG","NOT","O",..: 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 2 2 2 2 ...
##   ..@ names    : chr  "SampleID" "Station" "Depth" "m" ...
##   ..@ row.names: chr  "A8-F17-DNA" "B8-F18-DNA" "C8-F19-DNA" "D8-F20-DNA" ...
##   ..@ .S3Class : chr "data.frame"

Load PR2 taxonomy:

taxonomy <- read.csv("MiraiSeqs/taxonomy.csv", stringsAsFactors = FALSE)
names(taxonomy) <- c("row", "tax", "Confidence")
row.names(taxonomy) <-taxonomy[[1]]
taxonomy <- taxonomy[,(-1)]
taxonomy <-  separate(taxonomy, tax, c("D0","D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8", "D9", "D10", "D11", "D12", "D13", "D14"), sep = ";", fill = "right")
taxonomy <- taxonomy[,c(1:8)]
taxmat <- as.matrix(taxonomy)
TAX = tax_table(taxmat)
str(TAX)
## Formal class 'taxonomyTable' [package "phyloseq"] with 1 slot
##   ..@ .Data: chr [1:23115, 1:8] "Eukaryota" "Eukaryota" "Eukaryota" "Eukaryota" ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:23115] "000a220fe8edd771dd82f6915627ef3e" "000c79760dd16d6692018deb8ea1b164" "000d9eaa21eaff08f34fb1a1a87601d5" "000e6f109e01181b2b1eda5d6c99319c" ...
##   .. .. ..$ : chr [1:8] "D0" "D1" "D2" "D3" ...

Add taxonomy to phyloseq object:

ps = merge_phyloseq(phyloseq, TAX, META)

2.3 Prevalence Plots

prevdf = apply(X = otu_table(ps),
               MARGIN = ifelse(taxa_are_rows(ps), yes = 1, no = 2),
               FUN = function(x){sum(x > 0)})

prevdf = data.frame(Prevalence = prevdf,
                    TotalAbundance = taxa_sums(ps),
                    tax_table(ps))

#plyr::ddply(prevdf, "D2", function(df1){cbind(mean(df1$Prevalence),sum(df1$Prevalence))})

Prevalence plot:

prevplot1<-ggplot(prevdf, aes(TotalAbundance, Prevalence / nsamples(ps),color=D2)) +
  geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) +  geom_point(size = 2, alpha = 0.7) + 
  theme_bw()+
  scale_x_log10() +  xlab("Total Abundance") + ylab("Prevalence [Frac. Samples]") +
  facet_wrap(~D1) + theme(legend.position="none")

prevplot1

don’t apply prevalence filter

2.4 Basic statistics

OTUs <- data.frame(otu_table(ps))

total

OTUsRS<- OTUs
OTUsRS$RowSum <- rowSums(OTUsRS)
OTUsRSnoZero <- OTUsRS$RowSum!=0
sum(OTUsRSnoZero)
## [1] 23115

per sample is this number a zero? true or false –> col sums = richness

OTUs0 <- OTUs!=0
csums <- colSums(OTUs0)
csumdf <- as.data.frame(csums)
max(csumdf$csums) #1906
## [1] 1906
min(csumdf$csums) #49
## [1] 49
mean(csumdf$csums) #735
## [1] 734.789

alpha diversity plot

p<-plot_richness(ps, x = "Depth", measures = c("Observed", "Shannon"), color = "filter") + geom_boxplot() + theme_bw() + scale_color_manual(values = c("#999999", "#0072B2")) + theme(text = element_text(size=14))
p$layers <- p$layers[-1]
p

#ggsave("alphadiversity_Obs_Shann.pdf", width = 7, height =4)
ba <- breakaway(ps)
badf<- summary(ba) %>%
  add_column("SampleID" = ps %>% otu_table %>% sample_names)

badf<- merge(badf, metatable, by = "SampleID")

baPlot <- ggplot(badf, aes(x=Depth, y=estimate, fill=filter)) + geom_boxplot() + theme_bw() + scale_fill_manual(values = c("#999999", "#0072B2"), labels=c("10.0", "0.2")) + theme(text = element_text(size=14)) +ylab("Breakaway Alpha Diversity Estimate")

baPlot

#ggsave("breakaway_4depths.png", width = 6, height = 4)
bt <- betta(summary(ba)$estimate,
            summary(ba)$error,
            make_design_matrix(ps, "Depth"))
bt$table
##                   Estimates Standard Errors p-values
## (Intercept)       666.49293        20.47393    0.000
## predictorsMid     -29.21947        41.39565    0.480
## predictorsSCM     329.00654        41.44555    0.000
## predictorsSurface -18.51460        40.34140    0.646

bottom

bottomraw<- subset_samples(ps, Depth=="Bottom")
bba <- breakaway(bottomraw)
bbadf<- summary(bba) %>%
  add_column("SampleID" = bottomraw %>% otu_table %>% sample_names)

bbadf<- merge(bbadf, metatable, by = "SampleID")
bbadf$Station_f <- factor(bbadf$Station, levels=c("11", "10", "9", "8", "3", "4", "2", "5", "12", "13", "14", "15", "17", "18"))
bbaPlot <- ggplot(bbadf, aes(x=Station_f, y=estimate, fill=filter)) + geom_boxplot() + theme_bw() + scale_fill_manual(values = c("#999999", "#0072B2"), labels=c("10.0", "0.2")) + theme(text = element_text(size=14)) +ylab("Breakaway Alpha Diversity Estimate") +xlab("Station")

bbaPlot

#ggsave("breakaway_bottom.png", width = 6, height = 4)
bt <- betta(summary(bba)$estimate,
            summary(bba)$error,
            make_design_matrix(bottomraw, "VentPlume"))
bt$table
##             Estimates Standard Errors p-values
## (Intercept)  660.6267        31.74286    0.000
## predictorsp -139.8353       117.67309    0.235
## predictorsy  110.9332        83.20845    0.182
bt <- betta(summary(bba)$estimate,
            summary(bba)$error,
            make_design_matrix(bottomraw, "VentANAHTM"))
bt$table
##             Estimates Standard Errors p-values
## (Intercept)  648.7259        31.74286     0.00
## predictorsy  122.8341        83.20845     0.14

2.5 Taxonomic Filtering

  • Remove ASV’s without a D1 taxonomy assignment
  • Remove ASV’s assigned as Metazoa at D2
#table(tax_table(psN)[, "D1"], exclude = NULL)
ps<-subset_taxa(ps, D1 != "")
ps<-subset_taxa(ps, D1 != "NA")
ps<-subset_taxa(ps, D2 != "Metazoa")
OTUs <- data.frame(otu_table(ps))

is this number a zero? true or false –> col sums = richness

OTUs0 <- OTUs!=0
csums <- colSums(OTUs0)
csumdf <- as.data.frame(csums)
max(csumdf$csums) #1800
## [1] 1800
min(csumdf$csums) #45
## [1] 45
mean(csumdf$csums) #688
## [1] 688.8716

2.6 Distance and Ordination

2.6.1 All Samples

Remove incorrectly labeled samples:

mixedup <- c("F43", "F44", "F45", "F46", "F47", "F48", "F84")
'%ni%' <- Negate('%in%')
ps<- subset_samples(ps, SampleID %ni% mixedup)

2.6.1.1 Atchison distance (CLR + Euclidean)

  • compute CLR normalization, CLR = centered log-ratio, log(x/gx) where gx is the geomentric mean of vector x
  • Then Euclidean distance
  • PCoA and/or PCA
  • PERMANOVA
OTU4clr<- data.frame(t(data.frame(otu_table(ps))))
row.names(OTU4clr) <- gsub("\\.", "", row.names(OTU4clr))
OTUs.clr <- codaSeq.clr(OTU4clr + 0.5, samples.by.row=TRUE)
OTU2 <- otu_table(as.matrix(OTUs.clr), taxa_are_rows = FALSE)

row.names(metatable) <- gsub("-", "", row.names(metatable))
META2<- sample_data(metatable)

psCLR <- phyloseq(OTU2,TAX,META2)
ordu = ordinate(psCLR, "PCoA", "euclidean")
p<-plot_ordination(psCLR, ordu, color="Depth", shape="filter")+theme_bw() +scale_color_manual(values=ap)+ geom_point(size=3) +theme(text = element_text(size=16)) 
p

2.6.2 Ordination for sample subgroups

2.6.2.1 Surface + SCM

physeqSurf<- subset_samples(psCLR, Depth == "Surface" | Depth =="SCM")
ordu = ordinate(physeqSurf, "PCoA", "euclidean")
p<-plot_ordination(physeqSurf, ordu, color="Depth", shape="filter")+theme_bw() +scale_color_manual(values=colors)+ geom_point(size=3) +ggtitle("Surface &SCM - CLR/Euclidean")
p

vegan PERMANOVA - CLR

are filter differences statistically significant?

set.seed(1) #makes results reproducible, otherwise, slightly different everytime you run the test
OTUs <- data.frame(otu_table(physeqSurf))
 # filter sample data to include ONLY the samples included in this analysis. Otherwise, adonis will give an error. 
meta <- metatable[row.names(metatable) %in% row.names(OTUs),]

adonis(vegdist(OTUs, method = "euclidean")~ filter*Depth , data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUs, method = "euclidean") ~ filter *      Depth, data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##               Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)  
## filter         1      3060  3060.2 0.90671 0.00852  0.565  
## Depth          1      3554  3553.5 1.05287 0.00989  0.343  
## filter:Depth   1      4897  4896.9 1.45089 0.01363  0.070 .
## Residuals    103    347634  3375.1         0.96795         
## Total        106    359145                 1.00000         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The strongest determinant of clustering seems to be depth. To better observe clustering within depth groups, subset by depth and then filter type.

2.6.2.2 Surface 10.0

physeqSurf<- subset_samples(psCLR, Depth == "Surface" & filter == "10")
OTUsSurf10 <- data.frame(otu_table(physeqSurf))
meta <- metatable[row.names(metatable) %in% row.names(OTUsSurf10),]
Surface_10<- prcomp(OTUsSurf10)
plot(Surface_10, type="lines")

PCA_Surf10<-ggbiplot(Surface_10, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=colors)+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Surface 10.0")
PCA_Surf10

#ggsave("PCA_Surf10.png")
set.seed(1)
adonis(vegdist(OTUsSurf10, method = "euclidean") ~ Region_SKN, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsSurf10, method = "euclidean") ~      Region_SKN, data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)   
## Region_SKN  2      5547  2773.3  1.7617 0.12353  0.005 **
## Residuals  25     39355  1574.2         0.87647          
## Total      27     44902                 1.00000          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tst<-adonis_pairwise(x=meta, dd=vegdist(OTUsSurf10, method = "euclidean"), group.var="Region_SKN")
tst$Adonis.tab
##   Comparison         R2         F   df     p  p.adj
## 1     KG.NOT 0.12509963 2.5737713 1;18 0.002 0.0060
## 2     KG.SOT 0.06560932 0.9830262 1;14 0.455 0.4550
## 3    NOT.SOT 0.08497832 1.6716649 1;18 0.007 0.0105
  • KG to North - Diff
  • KG to South - NOT diff
  • North to South - Diff

2.6.2.3 Surface 0.2

physeqSurf2<- subset_samples(psCLR, Depth == "Surface" & filter == "2")
#tiff("variance_surface2.tiff", height= 8, width =8, units = 'in', res=70) # uncomment to save plot to file
OTUsSurf2 <- data.frame(otu_table(physeqSurf2))
meta <- metatable[row.names(metatable) %in% row.names(OTUsSurf2),]
surface2 <- prcomp(OTUsSurf2)
plot(surface2, type="lines")

#dev.off()
PCA_Surf2<-ggbiplot(surface2, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=colors)+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Surface 0.2")
PCA_Surf2

#ggsave("PCA_Surf2.png")

PERMANOVA

set.seed(1)
adonis(vegdist(OTUsSurf2, method = "euclidean") ~ Region_SKN, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsSurf2, method = "euclidean") ~ Region_SKN,      data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## Region_SKN  2      8969  4484.5  2.6743 0.18224  0.001 ***
## Residuals  24     40246  1676.9         0.81776           
## Total      26     49215                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pairwise PERMANOVA

tst<-adonis_pairwise(x=meta, dd=vegdist(OTUsSurf2, method = "euclidean"), group.var="Region_SKN")
tst$Adonis.tab
##   Comparison         R2        F   df     p  p.adj
## 1     KG.NOT 0.17442004 3.802855 1;18 0.001 0.0015
## 2     KG.SOT 0.09141829 1.308014 1;13 0.151 0.1510
## 3    NOT.SOT 0.13979305 2.762686 1;17 0.001 0.0015

2.6.2.4 DCM

physeqSCM<- subset_samples(psCLR, Depth == "SCM")
ordu = ordinate(physeqSCM, "PCoA", "euclidean")
p<-plot_ordination(physeqSCM, ordu, color="Station", shape="filter")+theme_bw() +scale_color_manual(values=ap)+ geom_point(size=3)
p

Again, strong clustering by filter type.

2.6.2.5 DCM 10.0

physeqSCM10<- subset_samples(psCLR, Depth == "SCM" & filter =="10")
OTUsSCM10 <- data.frame(otu_table(physeqSCM10))
meta <- metatable[row.names(metatable) %in% row.names(OTUsSCM10),]
SCM10 <- prcomp(OTUsSCM10)
plot(SCM10, type="lines")

PCA_SCM10<-ggbiplot(SCM10, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=colors)+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("SCM 10.0")
PCA_SCM10

#ggsave("PCA_SCM10.png")

PERMANOVA

set.seed(1)
adonis(vegdist(OTUsSCM10, method = "euclidean") ~ Region_SKN, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsSCM10, method = "euclidean") ~ Region_SKN,      data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)  
## Region_SKN  2     10020  5009.8  1.4684 0.10902  0.011 *
## Residuals  24     81884  3411.9         0.89098         
## Total      26     91904                 1.00000         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

pairwise PERMANOVA

tst<-adonis_pairwise(x=meta, dd=vegdist(OTUsSCM10, method = "euclidean"), group.var="Region_SKN")
tst$Adonis.tab
##   Comparison         R2        F   df     p  p.adj
## 1     KG.NOT 0.09160135 1.714251 1;17 0.010 0.0285
## 2     KG.SOT 0.08413800 1.194278 1;13 0.174 0.1740
## 3    NOT.SOT 0.07617784 1.484270 1;18 0.019 0.0285
  • KG and NOT = diff
  • N & S = Diff

2.6.2.6 DCM 0.2

physeqSCM2<- subset_samples(psCLR, Depth == "SCM" & filter == "2")
OTUsSCM2 <- data.frame(otu_table(physeqSCM2))
meta <- metatable[row.names(metatable) %in% row.names(OTUsSCM2),]
SCM2 <- prcomp(OTUsSCM2)
plot(SCM2, type="lines")

PCA_SCM2<-ggbiplot(SCM2, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=colors)+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("SCM 0.2")
PCA_SCM2

#ggsave("PCA_SCM2.png")

PERMANOVA

set.seed(1)
adonis(vegdist(OTUsSCM2, method = "euclidean") ~ Region_SKN, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsSCM2, method = "euclidean") ~ Region_SKN,      data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
## Region_SKN  2      6676  3337.9  1.1825 0.09707  0.153
## Residuals  22     62100  2822.7         0.90293       
## Total      24     68776                 1.00000

2.6.2.7 Mid

physeqMid<- subset_samples(psCLR, Depth == "Mid")
ordu = ordinate(physeqMid, "PCoA", "euclidean")
p<-plot_ordination(physeqMid, ordu, color="Station", shape = "filter")+theme_bw() +scale_color_manual(values=ap)+ geom_point(size=3)
p + theme(legend.position="bottom") +ggtitle("Euclidean dist with CLR, Mid")

Interesting …. Filter effect shows up in CLR/Euc

Split by filter…

2.6.2.8 Mid 10.0

physeqMid10<- subset_samples(psCLR, Depth == "Mid" & filter == "10")
OTUsMid10 <- data.frame(otu_table(physeqMid10))
meta <- metatable[row.names(metatable) %in% row.names(OTUsMid10),]
Mid10 <- prcomp(OTUsMid10)
plot(Mid10, type="lines")

PCA_Mid10<-ggbiplot(Mid10, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=colors)+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Mid 10.0")
PCA_Mid10

ggsave("PCA_Mid10.png")
## Saving 8 x 6.5 in image

PERMANOVA by Region

adonis(vegdist(OTUsMid10, method = "euclidean") ~ Region_SKN, data = meta, permutations = 999)
## 
## Call:
## adonis(formula = vegdist(OTUsMid10, method = "euclidean") ~ Region_SKN,      data = meta, permutations = 999) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)  
## Region_SKN  2      4934  2466.8  1.2673 0.09926  0.079 .
## Residuals  23     44769  1946.5         0.90074         
## Total      25     49703                 1.00000         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Not Significant!

2.6.2.9 Mid 0.2

2.6.2.9.1 CLR
physeqMid2<- subset_samples(psCLR, Depth == "Mid" & filter == "2")
OTUsMid2 <- data.frame(otu_table(physeqMid2))
meta <- metatable[row.names(metatable) %in% row.names(OTUsMid2),]
Mid2 <- prcomp(OTUsMid2)
plot(Mid2, type="lines")

PCA_Mid2<-ggbiplot(Mid2, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=colors)+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Mid 0.2")
PCA_Mid2

ggsave("PCA_Mid2.png")
## Saving 8 x 6.5 in image

PERMANOVA by Region

set.seed(1)
adonis(vegdist(OTUsMid2, method = "euclidean") ~ Region_SKN, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsMid2, method = "euclidean") ~ Region_SKN,      data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## Region_SKN  2      6883  3441.3  1.6878 0.13302  0.001 ***
## Residuals  22     44857  2038.9         0.86698           
## Total      24     51739                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

pairwise PERMANOVA

tst<-adonis_pairwise(x=meta, dd=vegdist(OTUsMid2, method = "euclidean"), group.var="Region_SKN")
tst$Adonis.tab
##   Comparison         R2        F   df     p  p.adj
## 1     KG.NOT 0.12483288 2.139583 1;15 0.009 0.0270
## 2     KG.SOT 0.11303007 1.401773 1;11 0.062 0.0620
## 3    NOT.SOT 0.07811164 1.525141 1;18 0.019 0.0285

North to South = diff KG to North = diff

2.6.2.10 Bottom

2.6.2.10.1 CLR
physeqBottom<- subset_samples(psCLR, Depth == "Bottom")
ordu = ordinate(physeqBottom, "PCoA", "euclidean")
p<-plot_ordination(physeqBottom, ordu, color="Station", shape = "filter")+theme_bw() +scale_color_manual(values=ap)+ geom_point(size=3)
p + theme(legend.position="bottom") +ggtitle("Euclidean dist with CLR, Bottom")

No clustering by filter type.

For consistency, split by filter type for the NOT/SOT/KG biogeography:

2.6.2.11 Bottom 10.0

physeqBottom10<- subset_samples(psCLR, Depth == "Bottom" & filter == "10")
OTUsBottom10 <- data.frame(otu_table(physeqBottom10))
meta <- metatable[row.names(metatable) %in% row.names(OTUsBottom10),]
Bottom10 <- prcomp(OTUsBottom10)
plot(Bottom10, type="lines")

PCA_Bottom10<-ggbiplot(Bottom10, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=colors)+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Bottom 10.0")
PCA_Bottom10

PERMANOVA by Region

set.seed(1)
adonis(vegdist(OTUsBottom10, method = "euclidean") ~ Region_SKN , data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsBottom10, method = "euclidean") ~      Region_SKN, data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)  
## Region_SKN  2      5815  2907.6  1.2317 0.09309  0.068 .
## Residuals  24     56656  2360.7         0.90691         
## Total      26     62471                 1.00000         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

not significant

2.6.2.12 Bottom 0.2

physeqBottom2<- subset_samples(psCLR, Depth == "Bottom" & filter == "2")
OTUsBottom2 <- data.frame(otu_table(physeqBottom2))
meta <- metatable[row.names(metatable) %in% row.names(OTUsBottom2),]
Bottom2 <- prcomp(OTUsBottom2)
plot(Bottom2, type="lines")

PCA_Bottom2<-ggbiplot(Bottom2, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=colors)+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Bottom 0.2")
PCA_Bottom2

PERMANOVA by region

set.seed(1)
adonis(vegdist(OTUsBottom2, method = "euclidean") ~ Region_SKN, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsBottom2, method = "euclidean") ~      Region_SKN, data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model     R2 Pr(>F)   
## Region_SKN  2      7126  3563.2   1.512 0.1162  0.007 **
## Residuals  23     54200  2356.5         0.8838          
## Total      25     61327                 1.0000          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pairwise PERMANOVA

tst<-adonis_pairwise(x=meta, dd=vegdist(OTUsBottom2, method = "euclidean"), group.var="Region_SKN")
tst$Adonis.tab
##   Comparison         R2        F   df     p p.adj
## 1     KG.NOT 0.06910372 1.187737 1;16 0.165 0.165
## 2     KG.SOT 0.09775075 1.408435 1;13 0.030 0.045
## 3    NOT.SOT 0.10568592 2.008982 1;17 0.001 0.003

For Vent v. Plume - use 10.0 and 0.2 samples together - needed bc low number of vent sites compared to non vent sites PERMANOVA

set.seed(1)
physeqBottom<- subset_samples(psCLR, Depth == "Bottom")
OTUsBottom <- data.frame(otu_table(physeqBottom))
meta <- metatable[row.names(metatable) %in% row.names(OTUsBottom),]
adonis(vegdist(OTUsBottom, method = "euclidean") ~ VentPlume, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsBottom, method = "euclidean") ~      VentPlume, data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model     R2 Pr(>F)    
## VentPlume  2      9303  4651.6  1.9195 0.0713  0.001 ***
## Residuals 50    121170  2423.4         0.9287           
## Total     52    130473                 1.0000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

pairwise PERMANOVA

tst<-adonis_pairwise(x=meta, dd=vegdist(OTUsBottom, method = "euclidean"), group.var="VentPlume")
tst$Adonis.tab
##   Comparison         R2         F   df     p p.adj
## 1        n.p 0.03681769 1.6436771 1;43 0.012 0.018
## 2        n.y 0.04858874 2.4002983 1;47 0.001 0.003
## 3        p.y 0.08956238 0.9837289 1;10 0.432 0.432
Bottom <- prcomp(OTUsBottom)
plot(Bottom, type="lines")

PCA_Bottom<-ggbiplot(Bottom, var.axes = FALSE, groups= meta$VentPlume, ellipse=TRUE ) +theme_bw() +scale_color_manual(values= c(wes_palette("Cavalcanti1")[4], wes_palette("Cavalcanti1")[1], wes_palette("Cavalcanti1")[5] ))+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Bottom")
PCA_Bottom

2.7 Merge Samples –> collapse replicates (for Relative Abundance Plots)

Prepare metadata table:

metatable <- read.csv("sampleMap.csv")
row.names(metatable) <- metatable[[1]]
metatable$desc <- paste(metatable$Station, metatable$Depth, metatable$filter, sep="-")
metatable <- metatable[metatable$SampleID %ni% mixedup,]
metatable$Depth <- as.character(metatable$Depth)
metatable$Station <-as.character(metatable$Station)
metatable$filter<-as.character(metatable$filter)
metatable<- metatable[,-which(names(metatable) == "SampleID")]
META<- sample_data(metatable)

ps<- subset_samples(ps, SampleID %ni% mixedup)
ps2<- ps
sample_data(ps2) <- META
print(ps2)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 19923 taxa and 211 samples ]
## sample_data() Sample Data:       [ 211 samples by 11 sample variables ]
## tax_table()   Taxonomy Table:    [ 19923 taxa by 8 taxonomic ranks ]
str(sample_data(ps2))
## 'data.frame':    211 obs. of  11 variables:
## Formal class 'sample_data' [package "phyloseq"] with 4 slots
##   ..@ .Data    :List of 11
##   .. ..$ : chr  "11" "10" "10" "10" ...
##   .. ..$ : chr  "Mid" "Bottom" "Mid" "SCM" ...
##   .. ..$ : int  700 1510 700 85 85 85 700 700 1510 1510 ...
##   .. ..$ : chr  "2" "2" "10" "2" ...
##   .. ..$ : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : Factor w/ 4 levels "KG","MOT","NOT",..: 4 4 4 4 4 4 4 4 4 4 ...
##   .. ..$ : Factor w/ 3 levels "KG","NOT","SOT": 3 3 3 3 3 3 3 3 3 3 ...
##   .. ..$ : Factor w/ 3 levels "n","p","y": 2 3 3 3 3 3 3 3 3 3 ...
##   .. ..$ : Factor w/ 4 levels "KG","NOT","O",..: 4 4 4 4 4 4 4 4 4 4 ...
##   .. ..$ : Factor w/ 2 levels "n","y": 1 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : chr  "11-Mid-2" "10-Bottom-2" "10-Mid-10" "10-SCM-2" ...
##   ..@ names    : chr  "Station" "Depth" "m" "filter" ...
##   ..@ row.names: chr  "F142" "F150" "F151" "F154" ...
##   ..@ .S3Class : chr "data.frame"

Merge:

mergedps <- merge_samples(ps2, "desc")
print(mergedps)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 19923 taxa and 112 samples ]
## sample_data() Sample Data:       [ 112 samples by 11 sample variables ]
## tax_table()   Taxonomy Table:    [ 19923 taxa by 8 taxonomic ranks ]

number of samples reduced from 211 to 112

sample_names(mergedps)
##   [1] "10-Bottom-10"  "10-Bottom-2"   "10-Mid-10"     "10-Mid-2"     
##   [5] "10-SCM-10"     "10-SCM-2"      "10-Surface-10" "10-Surface-2" 
##   [9] "11-Bottom-10"  "11-Bottom-2"   "11-Mid-10"     "11-Mid-2"     
##  [13] "11-SCM-10"     "11-SCM-2"      "11-Surface-10" "11-Surface-2" 
##  [17] "12-Bottom-10"  "12-Bottom-2"   "12-Mid-10"     "12-Mid-2"     
##  [21] "12-SCM-10"     "12-SCM-2"      "12-Surface-10" "12-Surface-2" 
##  [25] "13-Bottom-10"  "13-Bottom-2"   "13-Mid-10"     "13-Mid-2"     
##  [29] "13-SCM-10"     "13-SCM-2"      "13-Surface-10" "13-Surface-2" 
##  [33] "14-Bottom-10"  "14-Bottom-2"   "14-Mid-10"     "14-Mid-2"     
##  [37] "14-SCM-10"     "14-SCM-2"      "14-Surface-10" "14-Surface-2" 
##  [41] "15-Bottom-10"  "15-Bottom-2"   "15-Mid-10"     "15-Mid-2"     
##  [45] "15-SCM-10"     "15-SCM-2"      "15-Surface-10" "15-Surface-2" 
##  [49] "17-Bottom-10"  "17-Bottom-2"   "17-Mid-10"     "17-Mid-2"     
##  [53] "17-SCM-10"     "17-SCM-2"      "17-Surface-10" "17-Surface-2" 
##  [57] "18-Bottom-10"  "18-Bottom-2"   "18-Mid-10"     "18-Mid-2"     
##  [61] "18-SCM-10"     "18-SCM-2"      "18-Surface-10" "18-Surface-2" 
##  [65] "2-Bottom-10"   "2-Bottom-2"    "2-Mid-10"      "2-Mid-2"      
##  [69] "2-SCM-10"      "2-SCM-2"       "2-Surface-10"  "2-Surface-2"  
##  [73] "3-Bottom-10"   "3-Bottom-2"    "3-Mid-10"      "3-Mid-2"      
##  [77] "3-SCM-10"      "3-SCM-2"       "3-Surface-10"  "3-Surface-2"  
##  [81] "4-Bottom-10"   "4-Bottom-2"    "4-Mid-10"      "4-Mid-2"      
##  [85] "4-SCM-10"      "4-SCM-2"       "4-Surface-10"  "4-Surface-2"  
##  [89] "5-Bottom-10"   "5-Bottom-2"    "5-Mid-10"      "5-Mid-2"      
##  [93] "5-SCM-10"      "5-SCM-2"       "5-Surface-10"  "5-Surface-2"  
##  [97] "8-Bottom-10"   "8-Bottom-2"    "8-Mid-10"      "8-Mid-2"      
## [101] "8-SCM-10"      "8-SCM-2"       "8-Surface-10"  "8-Surface-2"  
## [105] "9-Bottom-10"   "9-Bottom-2"    "9-Mid-10"      "9-Mid-2"      
## [109] "9-SCM-10"      "9-SCM-2"       "9-Surface-10"  "9-Surface-2"

But metatable inside phyloseq object was disturbed …

str(sample_data(mergedps))
## 'data.frame':    112 obs. of  11 variables:
## Formal class 'sample_data' [package "phyloseq"] with 4 slots
##   ..@ .Data    :List of 11
##   .. ..$ : num  10 10 10 10 10 10 10 10 11 11 ...
##   .. ..$ : num  NA NA NA NA NA NA NA NA NA NA ...
##   .. ..$ : num  1510 1510 700 700 85 85 0 0 1210 1210 ...
##   .. ..$ : num  10 2 10 2 10 2 10 2 10 2 ...
##   .. ..$ : num  2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : num  4 4 4 4 4 4 4 4 4 4 ...
##   .. ..$ : num  3 3 3 3 3 3 3 3 3 3 ...
##   .. ..$ : num  3 3 3 3 3 3 3 3 2 2 ...
##   .. ..$ : num  4 4 4 4 4 4 4 4 4 4 ...
##   .. ..$ : num  2 2 2 2 2 2 2 2 1 1 ...
##   .. ..$ : num  NA NA NA NA NA NA NA NA NA NA ...
##   ..@ names    : chr  "Station" "Depth" "m" "filter" ...
##   ..@ row.names: chr  "10-Bottom-10" "10-Bottom-2" "10-Mid-10" "10-Mid-2" ...
##   ..@ .S3Class : chr "data.frame"

Fix meta table in phyloseq object:

meta2<-as.data.frame(sample_data(mergedps))
split<- do.call(rbind, strsplit(row.names(meta2), "-"))
meta2$Depth <- split[,2]
meta2$filter<-split[,3]
meta2$desc <- row.names(meta2)
meta2$Station <- as.character(meta2$Station)
meta2$Depth_f <- factor(meta2$Depth, levels=c('Surface','SCM','Mid','Bottom'))
meta2$Station_f <- factor(meta2$Station, levels=c("11", "10", "9", "8", "3", "4", "2", "5", "12", "13", "14", "15", "17", "18"))
META <-sample_data(meta2)
sample_data(mergedps)<-META

Fixed!

str(sample_data(mergedps))
## 'data.frame':    112 obs. of  13 variables:
## Formal class 'sample_data' [package "phyloseq"] with 4 slots
##   ..@ .Data    :List of 13
##   .. ..$ : chr  "10" "10" "10" "10" ...
##   .. ..$ : chr  "Bottom" "Bottom" "Mid" "Mid" ...
##   .. ..$ : num  1510 1510 700 700 85 85 0 0 1210 1210 ...
##   .. ..$ : chr  "10" "2" "10" "2" ...
##   .. ..$ : num  2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : num  4 4 4 4 4 4 4 4 4 4 ...
##   .. ..$ : num  3 3 3 3 3 3 3 3 3 3 ...
##   .. ..$ : num  3 3 3 3 3 3 3 3 2 2 ...
##   .. ..$ : num  4 4 4 4 4 4 4 4 4 4 ...
##   .. ..$ : num  2 2 2 2 2 2 2 2 1 1 ...
##   .. ..$ : chr  "10-Bottom-10" "10-Bottom-2" "10-Mid-10" "10-Mid-2" ...
##   .. ..$ : Factor w/ 4 levels "Surface","SCM",..: 4 4 3 3 2 2 1 1 4 4 ...
##   .. ..$ : Factor w/ 14 levels "11","10","9",..: 2 2 2 2 2 2 2 2 1 1 ...
##   ..@ names    : chr  "Station" "Depth" "m" "filter" ...
##   ..@ row.names: chr  "10-Bottom-10" "10-Bottom-2" "10-Mid-10" "10-Mid-2" ...
##   ..@ .S3Class : chr "data.frame"

Taxa glom - for clean RA plots

mergedps<- subset_taxa(mergedps, D2 != "Metazoa")
mergedps<-subset_taxa(mergedps, D1 != "")
mergedps<-subset_taxa(mergedps, D1 != "NA")

mergedps = tax_glom(mergedps, "D2")
LowPrev<- c("Alveolata_X", "Amoebozoa_X", "Apicomplexa", "Apusomonadidae", "Conosa", "Discoba", "Eukaryota_XX", "Foraminifera", "Hilomonadea","Fungi", "Katablepharidophyta", "Lobosa", "Mesomycetozoa", "Metamonada", "Metazoa", "Opisthokonta_X", "Perkinsea", "Streptophyta", "Rhodophyta","Telonemia", "")
mergedHighPrev<- subset_taxa(mergedps, D2 %ni% LowPrev)
mergedRAhp <- transform_sample_counts(mergedHighPrev, function(OTU) 100* OTU/sum(OTU))

ap<- c("#F1B6A1", "#F58851", "#E3E5DB", "#A5CFCC", "#11638C", "#A83860","#D4A52A", "#CF529C", "#1E479A", "#F7CDA4", "#42465C",  "#0E899F", "#DB5339")


taxabarplot<-plot_bar(mergedRAhp, x= "Station_f", fill = "D2", facet_grid=filter~Depth_f) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + scale_fill_manual(values= ap) + theme(legend.title=element_blank()) + geom_bar(aes(fill=D2), stat="identity", position="stack", width =0.9) +theme_classic() + theme(text = element_text(size=14))+theme(axis.text.x = element_text(angle = 90)) + xlab("Station") +ylab("Relative Abundance(%)")

taxaplot_nolegend <- taxabarplot + theme(legend.position="none")

taxaplot_nolegend

Dj1<- wes_palette("Darjeeling1")
Dj<- wes_palette("Darjeeling2")
Cv <- wes_palette("Cavalcanti1")
Gb<- wes_palette("GrandBudapest1")
Gb2<- wes_palette("GrandBudapest2")
Br<- wes_palette("BottleRocket2")
Rm<- wes_palette("Rushmore1")
R1 <- wes_palette("Royal1")
R2<-wes_palette("Royal2")
mr3<- wes_palette("Moonrise3")
C1<- wes_palette("Chevalier1")
wpcolor<- c( Dj1[1:4], "blue", Dj[1:4], R2[3],Cv[1:3],Br[1:4], Rm[2:4], Gb2,mr3,Gb[2:4], "white" )

wpcolor[17]<- "forestgreen"
wpcolor[26]<-"grey"
wpcolor[21]<-"black"


mergedRA <- transform_sample_counts(mergedps, function(OTU) 100* OTU/sum(OTU))

taxabarplot<-plot_bar(mergedRA, x= "Station_f", fill = "D2", facet_grid=filter~Depth_f) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + scale_fill_manual(values=wpcolor) + theme(legend.title=element_blank()) + geom_bar(aes(fill=D2), stat="identity", position="stack", width =0.9) +theme_classic() + theme(text = element_text(size=14))+theme(axis.text.x = element_text(angle = 90)) + xlab("Station") +ylab("Relative Abundance(%)")

taxaplot_nolegend <- taxabarplot + theme(legend.position="none")

taxaplot_nolegend

ggsave("taxaplot_nolegend.pdf", width = 8, height = 5)
g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  legend
}
legend <- g_legend(taxabarplot)
plegend <- grid.arrange(legend)

wpcolor2<- c(Br[1:3], Rm[3:5], R1[1:2], C1[1], C1[3:4])
taxabarplotD1<-plot_bar(mergedRA, x= "Station_f", fill = "D1", facet_grid=filter~Depth_f) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + scale_fill_manual(values=wpcolor2) + theme(legend.title=element_blank()) + geom_bar(aes(fill=D1), stat="identity", position="stack", width =0.9) +theme_classic() + theme(text = element_text(size=14))+theme(axis.text.x = element_text(angle = 90)) + xlab("Station") +ylab("Relative Abundance(%)")

taxaplotD1_nolegend <- taxabarplotD1 + theme(legend.position="none")

taxaplotD1_nolegend

g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  legend
}
legend <- g_legend(taxabarplotD1)
D1legend <- grid.arrange(legend)

BottomTaxa<-subset_samples(mergedRA, Depth == "Bottom")

Bottombarplot<-plot_bar(BottomTaxa, x= "Station_f", fill = "D2", facet_grid=~filter) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + scale_fill_manual(values=wpcolor) + theme(legend.title=element_blank()) + geom_bar(aes(fill=D2), stat="identity", position="stack", width =0.9) +theme_classic() + theme(text = element_text(size=14))+theme(axis.text.x = element_text(angle = 90)) + xlab("Station") +ylab("Relative Abundance(%)")

Bottombarplot_nolegend <- Bottombarplot+ theme(legend.position="none")

Bottombarplot_nolegend

ggsave("Bottombarplot_nolegend.pdf", width=8, height =4)

2.8 DESeq2 – Differential abundance testing

subset rawcount phyloseq object to just hold bottom samples

bottomraw<- subset_samples(ps, Depth=="Bottom")

First, only station 2 and station 10 are considered vents

ddse <- phyloseq_to_deseq2(bottomraw, ~VentANAHTM)
## converting counts to integer mode
ddse2 <- DESeq(ddse, test="Wald", fitType="parametric")
res<- results(ddse2,  cooksCutoff = FALSE)
alpha = 0.05
sigtab = res[which(res$padj < alpha), ]
head(sigtab)
## log2 fold change (MLE): VentANAHTM y vs n 
## Wald test p-value: VentANAHTM y vs n 
## DataFrame with 6 rows and 6 columns
##                                          baseMean    log2FoldChange
##                                         <numeric>         <numeric>
## 0595b46cbc5ccfb36ee584aa4bba1ab2 1.47374676747638  23.0718511423123
## 062eda4af9bf284c4f90a12a56c70528 15.0330428805362 -20.3280575624619
## 1999b708a0fc267d4b2689f846c8fc5d 43.4377717005511 -22.7039460856944
## 264bcfe948e3249d0351741f0ff2de89 17.1218935691345 -22.1527615190764
## 2d606b975a788dd2e2d30370526924cb 11.8913737552474 -22.1583793820037
## 4147786eeaf2d8e0def7eaff9e82ac85 9.92472807723299  25.7320482337882
##                                             lfcSE              stat
##                                         <numeric>         <numeric>
## 0595b46cbc5ccfb36ee584aa4bba1ab2 4.04436482692962  5.70469087968701
## 062eda4af9bf284c4f90a12a56c70528 2.90826082281126 -6.98976426151895
## 1999b708a0fc267d4b2689f846c8fc5d 4.09406649683235 -5.54557335677397
## 264bcfe948e3249d0351741f0ff2de89 2.84279000952778 -7.79261269556672
## 2d606b975a788dd2e2d30370526924cb 3.17412268665862 -6.98094609737021
## 4147786eeaf2d8e0def7eaff9e82ac85 4.04178874952267  6.36650003957458
##                                                pvalue                 padj
##                                             <numeric>            <numeric>
## 0595b46cbc5ccfb36ee584aa4bba1ab2 1.16554415268086e-08 2.28228114396821e-06
## 062eda4af9bf284c4f90a12a56c70528 2.75348510517914e-12 1.02065760083012e-09
## 1999b708a0fc267d4b2689f846c8fc5d 2.92992462963282e-08 5.39967874390567e-06
## 264bcfe948e3249d0351741f0ff2de89 6.56375825159134e-15 3.42737576703928e-12
## 2d606b975a788dd2e2d30370526924cb 2.93198800110791e-12 1.02065760083012e-09
## 4147786eeaf2d8e0def7eaff9e82ac85 1.93390250474416e-10 4.66070503643343e-08
dim(sigtab)
## [1] 24  6
sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(bottomraw)[rownames(sigtab), ], "matrix"))

AND if station 11 is a vent … First, only station 2 and station 10 are considered vents

ddse <- phyloseq_to_deseq2(bottomraw, ~Vent)
## converting counts to integer mode
ddse2 <- DESeq(ddse, test="Wald", fitType="parametric")
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 3726 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
res<- results(ddse2,  cooksCutoff = FALSE)
alpha = 0.05
sigtab = res[which(res$padj < alpha), ]
dim(sigtab)
## [1] 70  6

dim(sigtab[sigtab$log2FoldChange >0,])

sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(bottomraw)[rownames(sigtab), ], "matrix"))
#sigtab$log2FoldChange <- sigtab$log2FoldChange * -1
x = tapply(sigtab$log2FoldChange, sigtab$D2, function(x) max(x))
x = sort(x, TRUE)
sigtab$D2 = factor(as.character(sigtab$D2), levels=names(x))
#D3 level
x = tapply(sigtab$log2FoldChange, sigtab$D3, function(x) max(x))
x = sort(x, TRUE)
sigtab$D3 = factor(as.character(sigtab$D3), levels=names(x))
# D4 level
x = tapply(sigtab$log2FoldChange, sigtab$D4, function(x) max(x))
x = sort(x, TRUE)
sigtab$D4 = factor(as.character(sigtab$D4), levels=names(x))
#plot F0
sigplot<- ggplot(sigtab, aes(x=D2, y=log2FoldChange, color = D3)) + geom_point(size=5, alpha = 0.7) + theme(legend.title=element_blank()) + theme_bw() +
   ggtitle(" ") +
  theme(axis.text.x = element_text(angle = -45, hjust = 0, vjust=0.75)) + scale_color_manual(values=colors) + theme(text = element_text(size=16))
noLegendsigplot <- sigplot +theme(legend.position = "none")
noLegendsigplot
## Warning: Removed 1 rows containing missing values (geom_point).

#ggsave("sigplotnoLegend.png", height = 4, width = 6)
sigplot
## Warning: Removed 1 rows containing missing values (geom_point).

#ggsave("sigplot.png")